Method for transforming a speech signal using a pitch manipulator

ABSTRACT

Transformation of a speech signal comprises separating the speech signal into two signal parts (a, b), where (a) represents the quasistationary part and (b) the transient part of the signal. The signal (b) is filtered inversely and is supplied in parallel to a transient detector and a pitch manipulator, while the signal (a) is subjected to a spectral analysis. The transformation circuit permits well-defined manipulation of any speech signal, which is advantageous partly for hearing-impaired persons, partly for persons having normal hearing ability in noisy environments. Finally, the circuit has been found to be extremely expedient for synthesizing well-defined sounds, which is of great importance in the control of hearing aids (hearing loss simulator).

BACKGROUND OF THE INVENTION

The invention concerns a method of transforming a speech signal which is separated into two signal parts a, b, where a represents the quasistationary part of the signal with information on the formant frequencies, and b represents a residual signal, the transient part of the signal, containing information on pitch frequency and stop consonants, the signal b being produced by inverse filtration of the speech signal.

Such a method is known from U.S. Pat. No. 5,060,258 and from articles by U. Hartmann, K. Hermansen and F. K. Fink: "Feature extraction for profoundly deaf people", D.S.P. Group, Institute for Electronic Systems, Alborg University, September 1993, and by K. Hermansen, P. Rubak, U. Hartman and F. K. Fink: "Spectral sharpening of speech signals using the partran tool", Alborg University.

As described in the above articles, a speech signal is divided into two signal parts, one of which is described by a spectrum, and the other is a time signal. The spectral signal may be calculated on the basis of LPC (linear predictive coding), on the basis of FFT transformation or in another manner. The spectrum produced by the analysis is divided into a plurality of second order parallel sections, and as disclosed by the articles, the sections are characterized by three parameters, which are the resonance frequency f_(o), the Q value, Q = f_(o)/B, where B is the bandwidth of the section, and the power of the spectral part which is about the frequency f_(o). With these three parameters it is possible to transform (i.e. manipulate) the LPC or FFT spectrum. Further, this signal is typically composed of so-called formants, which are resonance frequencies in the vocal tract, or put differently, the signal describes a considerable part of the information content of a speech signal.

The second signal produced via an LPC analysis (inverse filtration) is a residual signal which in respect of voiced sounds is indicative of the tone or pitch of a speech signal, which is typically in the range from 100 to 300 Hz. For example, a male voice has a low pitch frequency, while a female voice has a somewhat higher one. The above-mentioned tone frequencies or pitch frequencies are defined as the number of pulses per second which are generated by the vocal cords.
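
The split into an LPC description and a residual obtained by inverse filtration can be sketched as follows. This is a minimal illustration, assuming the autocorrelation method with a Levinson-Durbin recursion; the function names `lpc_coefficients` and `inverse_filter` are hypothetical and are not taken from the cited articles.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a = [1, a1, ..., ap], the coefficients of the inverse filter A(z).
    This is one common way to do LPC analysis, assumed here for illustration.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # remaining prediction error
    return a

def inverse_filter(frame, a):
    """Apply the FIR inverse filter A(z); the output is the residual signal b."""
    frame = np.asarray(frame, dtype=float)
    return np.convolve(frame, a)[:len(frame)]
```

For a voiced frame, the residual returned by `inverse_filter` would be expected to show one pulse per pitch period, which is what makes the pitch frequency readable from it.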

Now, by means of the two subsignals it is possible to manipulate speech signals in several ways for use in many applications, as will appear from the following.

For example, transformation of speech signals of the above-mentioned type may be used for:

a) Changing the sound picture with a view to improving the speech intelligibility in noisy environments for persons having normal as well as impaired hearing ability.

b) Changing the sound picture with a view to improving the speech intelligibility and comfort of persons with severely impaired hearing.

c) Simulating hearing losses, e.g. for use in the testing of hearing aids.

As mentioned, according to the above-mentioned articles, the great advantage of the transformation of speech signals is that it is possible to manipulate the formant frequencies as well as the residual signal independently of each other. The fact is that if a complete speech signal is compressed/expanded by more than 10% (for persons with normal hearing), the speech quality will be partially destroyed. This restriction does not apply to the same extent if the pitch signal is maintained and the formant frequencies are reduced.

However, it has been found that the signal processing according to the above-mentioned articles may be improved. If, for example, a door slams, a hearing-impaired person carrying a hearing aid of any type can easily get an unpleasant surprise, because the circuit of the hearing aid is not sufficiently fast to attenuate this sudden signal.

In the circuit mentioned in the articles above, a so-called sound transient, such as e.g. the slam of a door, will substantially not be modeled by the LPC analysis, but will occur in the residual signal as a rather strong pulse.

Accordingly, it is the object of the invention to eliminate this noise signal in the residual channel.

SUMMARY OF THE INVENTION

This object is obtained by a method of transforming a speech signal, comprising separating the speech signal into two signal parts a, b, where a represents the quasistationary part of the signal with information on the formant frequencies, and b represents a residual signal with the transient part of the signal containing information on pitch frequency and stop consonants, said signal b being produced by inverse filtration of the speech signal, characterized in that, after the inverse filtration, the signal b is supplied in parallel to a transient detector and a pitch manipulator comprising a delay circuit which is serially coupled to a multiplier to which the output signal is supplied from the transient detector.

Signal pulses are captured in this manner by the transient detector, and since the signal to the multiplier is delayed with respect to the signal arriving from the transient detector, it is possible to eliminate the noise pulse by means of the multiplier. Further, it is extremely essential that the elimination of the noise pulse can take place completely independently of the signal processing in the other signal part, which comprises manipulation of the formant frequencies.

The output signal from the multiplier is supplied to a pitch converter. The pitch frequencies may hereby be changed independently of the signal processing of the formant frequencies. This means that a voice, without any change in its characteristic contents, may be transformed to another pitch.

In some cases it may be expedient in noise/transient elimination that the transient detector is connected to an output from a spectral calculation circuit having its input connected to the signal a, since this results in the incorporation of spectral information from the LPC analysis.

Finally, it is expedient that the elements of the residual signal b, which contains pitch frequency, sound transients, if any, and stop consonants, may be manipulated independently of each other by means of the pitch manipulator.

This is possible, because sound transient pulses, pitch pulses and stop consonant pulses have a different appearance. In other words, e.g. a noise pulse which is eliminated does not affect pitch frequency or stop consonants.

Since the residual signal b contains, inter alia, pitch pulses, stop consonants and noise transients, if any, as time sequential signal elements, these different signal elements may consequently be amplified/attenuated independently of each other. This is done by means of a multiplier, where the amplification factor (or attenuation factor) is controlled by a transient detector which classifies the various time sequential signal elements (pitch pulses, stop consonants, etc.). Owing to an inevitable delay in connection with the classification (see item B) of the various signal elements, a delay link has been added in front of the multiplier. Depending upon the classification, the multiplier is adjusted to an amplification factor of less than 1, equal to 1 or greater than 1.
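
The interplay of delay link and multiplier can be sketched as below. This is only an illustration under stated assumptions: the detector is taken to deliver one gain value per input sample after a fixed latency of `delay` samples, and the function name `delay_and_multiply` is hypothetical.

```python
import numpy as np

def delay_and_multiply(residual, gains, delay):
    """Delay the residual by `delay` samples, then scale it sample by sample.

    gains[n] is the factor the detector has chosen for input sample n. Because
    the classification takes time, that factor is applied when sample n leaves
    the delay line, i.e. at output index n + delay.
    """
    residual = np.asarray(residual, dtype=float)
    gains = np.asarray(gains, dtype=float)
    delayed = np.concatenate([np.zeros(delay), residual])    # delay link
    aligned_gains = np.concatenate([np.ones(delay), gains])  # detector output
    return delayed * aligned_gains                           # multiplier

# A strong pulse (5.0) is attenuated and the last element is amplified.
out = delay_and_multiply([1.0, 5.0, 1.0], [1.0, 0.1, 2.0], delay=2)
```

The delay is what lets the gain for a noise pulse be in place before the pulse itself reaches the multiplier.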

The classification of occurring transient signals in the residual signal b takes place on the basis of both the amplitude spectrum (frequency domain) and the residual signal (time domain).

The frequency composition of the time signal segment concerned is determined. This is indicated in FIG. 7, where the transient detector 15 receives information on the spectral composition from block 12 (calculation of spectrum).

Pitch pulses and stop consonants may be distinguished from each other, as the stop consonants have considerably more signal power concentrated in the high frequency range (frequency domain).

Noise transients may be distinguished from the other signal elements by means of a simple level detector, as noise transients contain peak amplitudes (in the time domain, i.e. the residual signal b) which are much higher than those of the "speech sounds".

It is moreover possible in principle to use some very advanced pattern recognition methods which have been developed in connection with speech recognition (e.g. classification based on cepstral coefficients).
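
The two criteria above (peak level in the time domain, share of high frequency power in the frequency domain) can be combined in a simple classifier. The sketch below is an illustration only: the thresholds `noise_peak`, `hf_cutoff` and `hf_ratio`, the gain table and the use of an FFT of the residual segment in place of the spectrum from block 12 are all assumptions, not values from the specification.

```python
import numpy as np

def classify_segment(segment, fs, noise_peak=4.0, hf_cutoff=3000.0, hf_ratio=0.5):
    """Label a residual segment as 'noise', 'stop', 'pitch' or 'silence'.

    All threshold values are hypothetical and would need tuning.
    """
    segment = np.asarray(segment, dtype=float)
    # Level detector: noise transients have much higher peaks than speech.
    if np.max(np.abs(segment)) > noise_peak:
        return "noise"
    power = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    total = power.sum()
    if total == 0.0:
        return "silence"
    # Stop consonants concentrate their power in the high frequency range.
    if power[freqs >= hf_cutoff].sum() / total > hf_ratio:
        return "stop"
    return "pitch"

# Hypothetical multiplier settings: less than 1, equal to 1, greater than 1.
GAIN = {"noise": 0.1, "pitch": 1.0, "silence": 1.0, "stop": 2.0}
```

A detector of this kind would look up `GAIN[classify_segment(...)]` for each segment and hand the result to the multiplier.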

When the strength-dynamic variation of the individual formants is compressed in relation to the actual dynamic range of the hearing-impaired person, which depends on the frequency range in which the individual formant is present, it is ensured that the strength variation of the "compressed formant" keeps within a range which is upwardly limited by the UCL (uncomfortable level) and downwardly limited by an increased hearing threshold. (As a typical hearing loss increases toward higher frequencies, the strength-dynamic compression must usually be increased toward higher frequencies.) This strength compression concerns just the "a channel". In other words, the pitch signal in the residual channel is not affected by strength compression, which it is in conventional analog multi-channel compression hearing aids.

The invention also concerns an apparatus for transforming a speech signal, comprising a circuit for splitting the signal into two parts a, b, where the first part a is supplied to a decomposition circuit in series with a transformation circuit, and the other part b is supplied to a circuit for inverse filtration. This apparatus is characterized in that the output from the circuit for inverse filtration is connected in parallel to a transient detector and a pitch manipulator comprising a series connection of a delay circuit and a multiplier circuit to which the output from the transient detector is connected.

The signal processing system of the invention is extremely useful particularly in connection with hearing aids, since it is possible to manipulate signals to the hearing aid, as regards transformation of frequencies from one range to another as well as selective change of the strength conditions. For example, it is frequently desirable to transform the high frequencies to a lower frequency range, since most of the hearing injuries occur at high frequencies. It is an advantage in this connection that the signal information is substantially intact, so that the hearing-impaired person will benefit from the information which persons of normal hearing ability receive in a wider frequency range. As mentioned, it is also advantageous that noise pulses may be eliminated, since they can be very uncomfortable to hearing-impaired persons.

As mentioned before, the spectrum (e.g. calculated via LPC or FFT) may be decomposed/divided into a plurality of second order sections having a specific centre frequency, bandwidth and strength.

The second order sections may be numbered according to increasing centre frequency. The sections having odd numbers are phase-shifted 180 degrees to prevent destructive interference after the summation.

The first section (No. 1) is padded with a zero for z=-1. The last section is padded with a zero for z=+1. All the other sections are padded with zeros at both z=-1 and z=+1.
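
One way to write down such a section is sketched below. The zero placement and the sign change for odd-numbered sections follow the description above; the two-pole resonator form of the denominator (pole radius derived from the bandwidth), the relation Q = f0/bandwidth and the function name `parallel_section` are assumptions made for illustration.

```python
import numpy as np

def parallel_section(index, count, f0, q, gain, fs):
    """Return (num, den) in powers of z^-1 for section `index` (1-based) of `count`."""
    bandwidth = f0 / q                      # assumes Q = f0 / bandwidth
    r = np.exp(-np.pi * bandwidth / fs)     # pole radius (assumed resonator form)
    theta = 2.0 * np.pi * f0 / fs           # pole angle
    den = np.array([1.0, -2.0 * r * np.cos(theta), r * r])
    if index == 1:
        num = np.array([1.0, 1.0, 0.0])     # zero at z = -1
    elif index == count:
        num = np.array([1.0, -1.0, 0.0])    # zero at z = +1
    else:
        num = np.array([1.0, 0.0, -1.0])    # zeros at z = -1 and z = +1
    sign = -1.0 if index % 2 == 1 else 1.0  # 180 degree shift for odd sections
    return sign * gain * num, den
```

The outputs of all sections would then be summed to give the overall filter response.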

LPC analysis is used for calculating the inverse filter, as mentioned before. The Q value of the zeros of the inverse filter may be adjusted adaptively via a factor alpha (typically 0.95-0.99), which is multiplied on all LPC coefficients. This adjustment is made in connection with the handling of pure tone signals which can be very pronounced for some female voices (and children's voices).
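
A common way to realize such an adjustment is to scale the k-th LPC coefficient by alpha to the power k, which moves every zero of the inverse filter radially toward the origin by the factor alpha and thereby lowers its Q value. Whether the specification intends exactly this weighting is an assumption; the sketch and the name `damp_inverse_filter` are illustrative only.

```python
import numpy as np

def damp_inverse_filter(a, alpha):
    """Scale coefficient a[k] of A(z) by alpha**k (assumed reading of the text).

    a is [1, a1, ..., ap]; the zeros of the returned filter are the original
    zeros multiplied by alpha.
    """
    a = np.asarray(a, dtype=float)
    return a * alpha ** np.arange(len(a))

damped = damp_inverse_filter([1.0, -1.3, 0.6], alpha=0.97)
```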

The very flexible signal processing according to the invention also allows speech to be synthesized. This has many applications, and the most interesting one is perhaps that it is now possible to produce synthesized speech where all parameters are known, which is an advantage particularly when testing hearing aids.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be explained more fully below with reference to the drawing, in which

FIG. 1 shows a block diagram of a known signal transformation circuit,

FIG. 2 shows the principles in block diagram view of the signal processing in the circuit shown in FIG. 1,

FIG. 3 shows the spectral signal in one channel,

FIG. 4 shows the residual signal in the other channel,

FIG. 5 shows an output signal after processing in the transformation circuit,

FIG. 6 shows an extended block diagram of the transformation circuit according to the invention,

FIG. 7 shows a detailed part of the pitch manipulator of FIG. 6 in block diagram view,

FIG. 8 shows an example of signal processing by means of the circuit of FIGS. 6 and 7, and

FIG. 9 shows an example of the transformation principles according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

As will be seen from FIG. 1, which shows a block diagram of a circuit for modifying a speech signal, the circuit consists of an analysis part 1 which splits the signal into two parts, one part of which consists of a decomposition part 2 and a transformation part 3 and is conducted in one branch, while the other part is a residual signal and is conducted in another branch, following which synthesis in the filter block 4 takes place to provide a modified speech signal. It will moreover be seen that the input of the transformation part is connected to a storage 29 which contains personal data, e.g. information on measured UCL, cf. the following, or on increased hearing threshold.

FIG. 2 shows more concretely how the two signal parts are processed, where one signal part designated a processes the quasistationary part of the signal in the block 5, which is then manipulated in the block 7, while the other signal part b processes the transient part in the block 6, which may likewise be manipulated in the block 8, and the two manipulated signals are combined into a modified speech signal. It is noted that the signal a is produced by decomposing the speech signal into a spectrum which is arranged in second order units; more particularly, these are parallel-divided so that each part represents a formant frequency which is described by its power, its resonance frequency f_(o) and its Q value, Q = f_(o)/B, where B is the bandwidth.

As the signal is thus divided into parallel parts, it is now possible to manipulate the individual parts on the basis of the above three parameters. In other words, the signal a, which contains information on the contents of a speech signal, may be manipulated in a flexible manner. For example, it will be possible to sharpen the formant frequencies by reducing the bandwidth. Of course, nothing prevents some frequency bands from being omitted in the transformation. The other part of the speech signal b, the residual signal, includes the pitch frequency, which in respect of voiced sounds is indicative of the tone, which is typically in the range from 100 to 300 Hz. In this part, the pitch frequency may be manipulated completely independently of the formant frequencies, which means that e.g. a male voice may be transformed to a child's voice without anything of the information in the speech signal being lost. An example of signal processing in the circuit mentioned above is shown in FIG. 3, which shows the quasistationary part of an LPC spectrum for the word "pølsevognen", without noise contamination. FIG. 4 shows the residual signal for the same word, while FIG. 5 shows a spectrum after it has passed through the circuit in FIGS. 1 and 2, the spectral parts having been sharpened, or rather more clearly separated from each other. The signal processing in FIG. 5 has been performed by changing the bandwidth while maintaining the two other parameters, which are the power in the spectrum and the resonance frequency.
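
The sharpening described for FIG. 5 amounts to changing one of the three section parameters while holding the other two. A minimal sketch, assuming the Q value is the resonance frequency divided by the bandwidth; the `Section` type and the `sharpen` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Section:
    f0: float     # resonance frequency in Hz
    q: float      # Q value, taken here as f0 / bandwidth
    power: float  # power of the spectral part around f0

    @property
    def bandwidth(self):
        return self.f0 / self.q

def sharpen(section, factor):
    """Narrow the bandwidth by `factor` (> 1) while keeping f0 and power."""
    return replace(section, q=section.q * factor)

formant = Section(f0=500.0, q=5.0, power=1.0)   # bandwidth 100 Hz
sharper = sharpen(formant, 2.0)                 # bandwidth 50 Hz
```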

The case shown in FIGS. 3-5 involved a noiseless signal, but precisely the same might be performed in case of a noise contaminated signal. In such a case the noise would be reduced considerably, which may be utilized for eliminating noise for persons with impaired hearing ability as well as with normal hearing ability.

FIG. 6 shows the transformation circuit of the invention. In the figure, 9 is a microphone which transfers the speech signal to an analog to digital converter 10 and from there to a pre-emphasis filter 11. The signal is then passed into two blocks shown in dashed line, viz. the blocks 1, 2 which correspond to the blocks shown in FIG. 1, viz. the block 1 forming the analysis part and the block 2 forming the decomposition part. As will be seen, the block 2 consists of a circuit 12 for calculating the spectrum of the speech signal, which is then passed into the circuit 13, in which the signal is pseudo-decomposed, which means that the signal is parallel-divided and is described by means of the parameters resonance frequency f_(o), Q value and power P of the signal at the given resonance frequency. It is noted that the calculation of the spectrum in the block 12 may be performed on the basis of LPC coefficients, on the basis of FFT transformation or optionally on the basis of PLP (perceptual linear prediction) calculation.

After the pseudo-decomposition in the circuit 13, the signal is passed to the transformation circuit 14 in which the spectrum is changed by means of the above-mentioned three parameters. Then, the output from the transformation circuit is passed to a pulse response circuit 16, which determines the pulse response of the transformed filters and scales it. The signal is passed from the output of the pulse response circuit 16 to a synthesis filter 21. As will be seen from the drawing, the signal is passed from the pre-emphasis filter 11 to an LPC circuit 17, whose output is passed to an inverse filter circuit 19 having variable coefficients based on LPC. A delay circuit 18, whose input receives signals from the pre-emphasis filter 11, is connected to another input of the inverse filter 19. The output of the inverse filter 19 is passed to a pitch manipulator 20 to whose other input a transient detector 15 is connected. Furthermore, as shown by the reference numeral 25, it is possible to establish a connection from the spectral calculation circuit 12 to the transient detector 15. The output of the pitch manipulator 20 is passed to the synthesis filter 21, whose output is passed to a post-emphasis circuit 22, whose output is passed further on to a digital to analog converter 23 and finally to a loudspeaker 24. As will be seen from FIG. 7, the pitch manipulator 20 consists of a delay circuit 26, a multiplier 27 and a pitch converter 28 intended to change the pitch frequency.
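
The pre-emphasis filter 11 and the post-emphasis circuit 22 bracket the whole processing chain. The specification does not give their transfer functions; the sketch below assumes the widely used first order pair, with a hypothetical coefficient `mu = 0.97`, in which the post-emphasis stage is the exact inverse of the pre-emphasis stage.

```python
import numpy as np

def pre_emphasis(x, mu=0.97):
    """First order high-frequency boost: y[n] = x[n] - mu * x[n-1] (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= mu * x[:-1]
    return y

def post_emphasis(y, mu=0.97):
    """Inverse of pre_emphasis: x[n] = y[n] + mu * x[n-1]."""
    y = np.asarray(y, dtype=float)
    x = np.empty_like(y)
    prev = 0.0
    for n, value in enumerate(y):
        prev = value + mu * prev
        x[n] = prev
    return x
```

With no processing in between, `post_emphasis(pre_emphasis(x))` reproduces `x`, so any audible change comes from the transformation itself.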

As regards the quasistationary part of the signal, i.e. the signal a in FIG. 2, the circuit of FIGS. 6 and 7 operates in the same manner as described before and will therefore not be discussed more fully here. On the other hand, according to the invention, the signal processing in the residual channel is different from the one described before. To illustrate the signal processing in the residual channel, reference is made to FIG. 8, showing at I a time signal which consists of two pitch pulses p, a noise pulse si and a stop consonant sk. It is contemplated that this signal emerges from the inverse filter 19 and is supplied to the transient detector 15 and the delay circuit 26. As will be seen at I, the appearance of the pulses is different, and they can thus be separated. For example, the transient detector is adapted such that it detects the noise pulse on the basis of its amplitude and signals the multiplier 27 to reduce its amplification, following which the same signal is passed via the delay circuit 26 to the multiplier when the amplification thereof is reduced, which is shown at II below the noise pulse si at I. As regards the pitch pulses p shown on the time axis I, these are processed by means of the pitch converter 28, which forms part of the pitch manipulator 20. With respect to previously known signal processing methods, this is done in the residual signal, as already mentioned, which is of importance if it is desired to transform a voice, e.g. a child's voice to an adult's voice, without the contents of the speech signal being changed. Finally, a stop consonant sk is shown on the time axis. This stop consonant may be changed by means of the multiplier independently of the noise pulses si and the pitch pulses p, as the stop consonants may be identified by combining time domain analysis in the residual signal with spectral information from the LPC analysis. It is hereby possible to increase the amplification as long as the stop consonant exists. The bottom line in FIG. 8 marked III shows the result of the impact of the pitch manipulator on the pitch pulses, the noise transients and the stop consonants.

An example of the use of the transformation principles according to the invention will be described below with reference to FIG. 9.

It is known that a large group of hearing losses is characterized in that the hearing-impaired person has a greatly reduced dynamic range of e.g. 20 dB. The normal dynamic range is about 120 dB. The sound pressure level at which discomfort occurs is called UCL below and is of the order of 120 dB. The normal hearing threshold is about 0 dB. In other words, a great hearing loss is accompanied by a small dynamic range. If e.g. the hearing threshold is increased to 90 dB, the dynamic range will be 120-90=30 dB. This dynamic range will additionally be reduced by about 10 dB in connection with speech perception, as the speech level must be about 10 dB above the hearing threshold for the speech perception to be reasonable. This means that the effective dynamic range is reduced to about 20 dB in this case. The "inherent dynamic" of the actual speech signal is of the same order. This should additionally be related to the circumstance that the speech level varies considerably when the distance between the hearing-impaired person and the speaker concerned changes. The speech level drops by about 6 dB if the speaker moves from 1 to 2 meters' distance from the hearing-impaired person.
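
The arithmetic in this paragraph can be reproduced in a few lines. The sketch assumes that the 6 dB figure corresponds to the inverse distance law for a point source in a free field, which the specification does not state explicitly; the function names are hypothetical.

```python
import math

def effective_dynamic_range(ucl_db, threshold_db, speech_margin_db=10.0):
    """Range between UCL and the level needed for reasonable speech perception."""
    return ucl_db - threshold_db - speech_margin_db

def level_change_db(distance_from, distance_to):
    """Level change when the speaker moves (assumed inverse distance law)."""
    return -20.0 * math.log10(distance_to / distance_from)

usable = effective_dynamic_range(120.0, 90.0)  # 120 - 90 - 10 = 20 dB
drop = level_change_db(1.0, 2.0)               # roughly -6 dB
```

A 6 dB drop for one doubling of distance therefore already consumes close to a third of the 20 dB that remain.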

It is moreover noted that the hearing loss greatly depends on frequency, and the hearing loss often increases toward higher frequencies, i.e. in many cases hearing is relatively intact in the low frequency range of up to 1000 Hz. This means that the compensation for the hearing loss must normally be frequency-dependent.

Generally, hearing loss compensation is based on the overall principle that the formant frequencies must be located between the curve which represents the individual UCL (uncomfortable level) and a curve which is 2-10 dB above a specific hearing-impaired person's hearing threshold measured individually. This range is called ITS below (individual target space). This overall principle ensures that as much as possible of the speech can be heard by the individual hearing-impaired person.
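
Placing a formant level inside the ITS can be illustrated with a simple mapping. The specification fixes only the two bounds (UCL above, hearing threshold plus 2-10 dB below); the linear mapping, the clamping and all parameter names below are assumptions chosen for illustration.

```python
def map_into_its(level_db, in_lo, in_hi, threshold_db, ucl_db, margin_db=5.0):
    """Map a formant level from [in_lo, in_hi] into the individual target space.

    The ITS runs from threshold_db + margin_db up to ucl_db at the formant's
    frequency. A linear mapping is assumed here; margin_db should lie in the
    2-10 dB range named in the text.
    """
    lo = threshold_db + margin_db
    hi = ucl_db
    if hi <= lo:
        raise ValueError("no usable range between threshold and UCL")
    t = (level_db - in_lo) / (in_hi - in_lo)
    t = min(max(t, 0.0), 1.0)  # clamp levels outside the expected input range
    return lo + t * (hi - lo)

# A formant at 60 dB, in speech spanning 30-90 dB, for a 90 dB threshold:
target = map_into_its(60.0, 30.0, 90.0, threshold_db=90.0, ucl_db=120.0)
```

Because hearing threshold and UCL vary with frequency, such a mapping would be evaluated separately for each formant.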

This adaptation is made continuously each time a new frequency spectrum has been calculated. The system of the invention provides full control of the individual formants, and the system is therefore capable of transforming the registered formants optimally into the individual hearing-impaired person's ITS. The transformation circuit is moreover flexible, because the necessary information on the formants is available in a parametric form and additionally corresponds to an articulatorily natural and correct representation.

It is important that the strength of the formants with respect to each other may be changed with respect to the "natural" strength distribution. This must be seen in relation to the changed masking conditions for the hearing-impaired persons. A hearing loss curve with a greatly increasing hearing loss toward higher frequencies means e.g. that the lowest formant will easily mask the next-lowest formant. Therefore, it will usually be advantageous to establish amplification of the individual formant frequencies which increases toward higher frequencies (seen in relation to the size of the hearing loss at the individual formant frequencies).

A whispering voice is characterized, inter alia, in that the mutual strength of the various formants is changed with respect to a "normal voice". (Additionally, the pitch pulses are absent, the excitation taking place via a turbulent flow of air.) Further, it is an interesting observation that it is often easier for hearing-impaired persons to understand a whispering voice which is amplified suitably (the dynamic of the whispering voice better matches a typical high frequency hearing loss and the resulting changed masking conditions).

The circumstances surrounding the dynamic change of the strength conditions are moreover very important. If the strength adaptation of the formants is made at a wrong pace, temporally, some important items of information on the speech signal modulation pattern are destroyed. This may be described by means of the concept of the modulation transfer function, called MTF below, cf. Technical Review, Bruel og Kjaer, no. 2, 1985. It is very important that the speech modulation for modulation frequencies in the range from about 0.5 Hz to 20 Hz is not distorted noticeably.

The general opinion is that a pronounced change in the modulation conditions, e.g. described by means of MTF, is the reason why analog multi-channel compressing hearing aids apparently do not give any noticeable improvement of the speech intelligibility in spite of the fact that the dynamic strength adaptation is considerably better than in conventional single channel hearing aids. Some more recent adaptation strategies for hearing aid users thus also include optimization of the MTF conditions.

It is easy to control the time dynamic conditions in the transformation system of the invention. As described above, the strength of the formants must not be changed at a wrong pace, since the modulation conditions of the speech would then be changed to an unacceptable degree. An advanced version of the transformation system allows the MTF conditions to be included in connection with the continuous transformation of the formants into the individual user's ITS. The above-mentioned conditions are illustrated in FIG. 9, where the graph 1 shows UCL, the graph 2 shows formant structures, f1, f2, f3, where f2 and f3 will be raised more than f1 in terms of strength. The curve 3 shows the characteristic of a person having a typical high frequency hearing loss, while the graph 4 shows the characteristic of a person having normal hearing ability. The transformation circuit of the invention allows the formant frequencies to be manipulated such that these will be between the curves 1 and 3, thereby enabling a hearing-impaired person to perceive the same or essentially the same information as a person having a normal hearing threshold. It is noted that the above-mentioned signal processing provides more possibilities of greater changes in the formant structures, since the pitch frequency is not included, but may be adjusted completely independently.

We claim:
 1. A method of transforming a speech signal, comprising separating the speech signal into two signal parts a, b, where a represents the quasistationary part of the signal with information on the formant frequencies, and b represents a residual signal with the transient part of the signal containing information on pitch frequency and stop consonants, said signal b being produced by inverse filtration of the speech signal, characterized in that, after the inverse filtration, the signal b is supplied in parallel to a transient detector and a pitch manipulator comprising a delay circuit which is serially coupled to a multiplier to which the output signal is supplied from the transient detector.
 2. A method according to claim 1, characterized in that the multiplier is controlled by a control signal from the transient detector and is adapted to perform time sequential amplification/attenuation of the various signal elements.
 3. A method according to claim 1 or 2, characterized in that the output signal from the multiplier is supplied to a pitch converter.
 4. A method according to claim 1, characterized in that the transient detector is connected to an output from a spectral calculation circuit whose input is connected to the signal a.
 5. A method according to claim 1, characterized in that the residual signal b containing information on pitch frequency, sound transients and stop consonants may be manipulated independently of each other by means of the pitch manipulator.
 6. A method according to claim 1, characterized in that strength-dynamic variation of the individual formants is compressed in relation to the hearing-impaired person's actual dynamic range, which is frequency-dependent and depends on the frequency range in which the individual formant is present.
 7. An apparatus for transforming a speech signal comprising a circuit for splitting the signal into two signal parts a and b, a decomposition circuit, a transformation circuit and an inverse filtering circuit, the first signal part a representing the quasistationary part of the signal which is supplied to said decomposition circuit whose output is supplied to said transformation circuit, the second signal part b representing the transient part of the speech signal which is produced in said inverse filtering circuit, characterized by further comprising a transient detector and a pitch manipulator, the output from the inverse filtering circuit being supplied in parallel to said transient detector and said pitch manipulator, said pitch manipulator comprising a series connection of a delay circuit, a multiplier, and a pitch converter, the output signal from said transient detector being supplied to said multiplier.
 8. An apparatus according to claim 7, characterized in that the multiplier, which is controlled by the output signal from the transient detector, provides a time sequential amplification so that the stop consonants are amplified, while the pitch pulses are transmitted with the unchanged strength and the noise pulses are attenuated.
 9. An apparatus according to claim 7, characterized in that the multiplier, which is controlled by the output signal from the transient detector, provides a time selective amplification so that the stop consonants are amplified, while pitch pulses are transmitted with the unchanged strength and the noise pulses are attenuated.